Abstract



This paper compares factor analysis and Rasch measurement. It
shows how they 1) address the same data, with different
interpretations of numerical status, 2) use the same estimation
method, with different measurement models, and 3) solve the same
problem, with different utility. Factor analysis is faulted for
1) mistaking stochastic observations of ordered labels for
established linear measures and 2) failing to construct linear
measurement. How to use Rasch measurement to replace factor
analysis is developed for a dichotomy and shown for a rating
scale example.



Factor Analysis 

Input datum $x_{ni}$ is a test score, Likert rating or MCQ
response of persons n=1,N to items (or tests) i=1,L. The raw
data are expected to be sufficiently linear to allow equating
incommensurable item origins and scales by subtracting local
means and dividing by local standard deviations.

The sample standardized data for Factor 1 become: 



$$z_{ni1} = \frac{x_{ni} - m_i}{s_i} \quad \text{with} \quad m_i = \sum_{n=1}^{N} x_{ni} / N, \quad s_i^2 = \sum_{n=1}^{N} (x_{ni} - m_i)^2 / (N - 1) \qquad (1)$$

This item scale equating expects complete data. Only
persons with usable responses to every item can participate.
When data are missing, they must be feigned or incomplete persons 
(or items) deleted. Deleting persons alters the interpretation 
of the standardizing sample. Deleting items alters the 
construct. Pair-wise deletion to estimate correlations biases 
factors. 
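
The standardization of Equation (1), together with the deletion of incomplete persons it forces, can be sketched as follows. This is a minimal illustration and not the author's software: the function name `standardize_items`, the use of NaN to mark a missing response, and the small ratings matrix are all assumptions made for the example.

```python
import numpy as np

def standardize_items(x):
    """Column-standardize a persons-by-items matrix as in Equation (1).

    Missing responses are assumed to be coded as NaN (an assumption of
    this sketch). Persons with any missing response are deleted first.
    """
    x = np.asarray(x, dtype=float)
    complete = ~np.isnan(x).any(axis=1)      # persons with every item answered
    x = x[complete]
    m = x.mean(axis=0)                       # local (item) means m_i
    s = x.std(axis=0, ddof=1)                # local standard deviations s_i, N-1 divisor
    return (x - m) / s, complete

# Hypothetical ratings of 5 persons on 3 items; person 3 skipped item 2.
ratings = np.array([[3, 2, 1],
                    [2, 2, 0],
                    [1, np.nan, 0],
                    [0, 1, 1],
                    [3, 3, 2]])
z, kept = standardize_items(ratings)
```

Note that the means and standard deviations are computed after the incomplete person is dropped, which is why deleting persons changes the standardizing sample.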

The model for Factor 1 is:

$$z_{ni1} = u_{n1} v_{i1} + e_{ni1}, \qquad e_{ni1} \sim N(0, \sigma^2) \qquad (2)$$

$\{u_{n1}, n=1,N\}$ is a vector of person "factor scores",
predicted for persons by Factor 1.

$\{v_{i1}, i=1,L\}$ is a vector of item "factor loadings",
the regressions of $\{u_{n1}\}$ on the data from items i=1,L.

1 A short version of this manuscript was delivered at the Mid-Western Educational
Research Association Annual Meeting, October 13, 1994.
The residual from Factor 1 is:

$$z_{ni1} - u_{n1} v_{i1} = e_{ni1} = z_{ni2} \qquad (3)$$

Whether this residual is all error, $e_{ni1} \sim N(0, \sigma^2)$,
or in addition to error implies other factors is presumably
unknown. 2 When an additional factor is suspected, Factor 1
residuals are used for seeking Factor 2 and so on. Matrices of
decreasing residuals $\{\{z_{nij}\}\}$ for j=2,M are extracted in turn to
calculate the M factor model (Thurstone, 1947):



$$z_{ni1} = \sum_{j=1}^{M} u_{nj} v_{ij} + e_{ni}, \qquad e_{ni} \sim N(0, \sigma^2) \qquad (4)$$



Since there is no objective basis for a "right" number of
factors nor a "theoretical" value for "error" $\sigma^2$ to factor down
to, factor analysts default to conventions like: Stop when
factor size (sum of squared factor loadings) becomes less than
one. Stop when successive factor sizes level off. Stop after
two or three factors, because anything more complicated is
impossible to replicate.

The simplest way to obtain optimal values for person and
item vectors $\{u_{nj}\}$ and $\{v_{ij}\}$ for each Factor j is to minimize:

$$\sum_{n=1}^{N} \sum_{i=1}^{L} (z_{nij} - u_{nj} v_{ij})^2 \qquad (5)$$

This "direct factor analysis" (Saunders 1950, Cattell 1952,
MacRae 1960, Wright and Evitts 1961) is a principal component
decomposition (Hotelling 1933) of a "sample standardized" data
matrix into j=1,M item vectors $\{v_{ij}, i=1,L\}$ and M corresponding
person vectors $\{u_{nj}, n=1,N\}$. 3 The results are comparable to
Thurstone centroids (1947).

Decomposition to identify each Factor j = 1,M is
accomplished by initializing at $u_{nj} = 1$ for all n and iterating
through equations:

$$v_{ij} = \sum_{n=1}^{N} u_{nj} z_{nij} / N, \qquad u_{nj} = \sum_{i=1}^{L} v_{ij} z_{nij} \Big/ \sum_{i=1}^{L} v_{ij}^2 \qquad (6)$$

renormed by $C^2 = \sum_{n=1}^{N} u_{nj}^2 / N$, $u_{nj} = u_{nj} / C$, so that $\sum_{n=1}^{N} u_{nj}^2 = N$,

until successively smaller changes in $\{u_{nj}\}$ become uninteresting.
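
The alternating iteration of Equation (6) can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the original program: the function name `direct_factor` and the simulated one-factor data are hypothetical, and the starting vector is each person's row sum rather than a constant, because a constant vector is orthogonal to mean-centred columns and would give all-zero loadings on the first pass.

```python
import numpy as np

def direct_factor(z, max_iter=1000, tol=1e-10):
    """Extract one factor from a standardized matrix z by iterating Equation (6)."""
    z = np.asarray(z, dtype=float)
    n_persons = z.shape[0]
    # Assumption of this sketch: start from person row sums, not a constant.
    u = z.sum(axis=1)
    u = u / np.sqrt((u @ u) / n_persons)
    for _ in range(max_iter):
        v = z.T @ u / n_persons                    # loadings: regress scores on items
        u_new = z @ v / (v @ v)                    # scores predicted from loadings
        u_new = u_new / np.sqrt((u_new @ u_new) / n_persons)   # renorm so sum u^2 = N
        change = np.max(np.abs(u_new - u))
        u = u_new
        if change < tol:                           # changes have become "uninteresting"
            break
    v = z.T @ u / n_persons
    return u, v, v @ v                             # scores, loadings, factor size

# Hypothetical data: 200 persons, 6 items sharing one common factor.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1)) + 0.5 * rng.normal(size=(200, 6))
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
u, v, size = direct_factor(z)
residual = z - np.outer(u, v)                      # Equation (3): data for the next factor
```

Up to sign, the converged scores and loadings agree with the first component of a singular value decomposition of the same matrix, which is one way to check an implementation of this kind.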



2 Should another factor be expected, the most efficient approach is to
analyze each subset of expected-to-be-singular items separately and to defer
comparing any resulting "variables" until their construct definition and
quantitative representation is established.

3 Principal component decomposition is the core of most contemporary factor
analysis and multi-dimensional scaling programs.
Standardized factor score $u_{nj}$ is the value predicted by
Factor j's regression on "independent" variables i = 1,L with
regression coefficients $\{v_{ij}\}$.

Factor j results are:

a) Factor j Size (variance "explained"):

$$G_j = \sum_{i=1}^{L} v_{ij}^2 \quad \text{with} \quad 0 < G_j < L \qquad (7)$$

b) Factor j Loadings (regression coefficients):

$$v_{ij} = \sum_{n=1}^{N} u_{nj} z_{nij} / N$$

of factor scores $\{u_{nj}\}$ on residuals $\{\{z_{nij}\}\}$.

c) Factor j Standard Scores (zero mean, unit variance):

$$u_{nj} = \sum_{i=1}^{L} v_{ij} z_{nij} / G_j$$

predicted by regression coefficients $\{v_{ij}\}$
from residuals $\{\{z_{nij}\}\}$.

Problems with Factor Analysis 

1. Raw data $\{\{x_{ni}\}\}$ are never linear measures. Even test
scores, unless transformed into logits, become increasingly
non-linear as they near their finite limits. When $x_{ni}$ is a Likert
rating it is not even cardinal! But ordinal data are not suited
to Equations (1) through (5) and the factor scores they produce
are necessarily non-linear.

2. In view of the non-linearity, the "true score" error
models of Equations (2) and (3) do not deal with uncertainty in a
useful way.

3. The necessity for complete data is awkward. Data are 
never complete. 

4. After each factor is extracted, its residuals $\{\{z_{nij}\}\}$ are
the data for smaller factors. These residuals contain one sure
effect, the turbulence left behind by the estimation of preceding
factors. Intimations of smaller factors are necessarily awash in
the residual wake of larger factors.

5. Without a basis for anticipating a final "error" size for
$\sigma^2$, there is no objective way to decide when to stop factoring.

6. Software implementations seldom provide standard errors 
for factor loadings or factor scores. 

7. When a "same" set of items is refactored from a new 
sample of persons, neither factor sizes nor loadings are ever the 
same. Only the most generous fudging allows one to suppose their 
factor structure has been confirmed. 
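
Problem 1 can be made concrete with a small calculation. The sketch below assumes, for illustration only, a 20-item test and the simple log-odds transformation of a raw score, which ignores the spread of item difficulties. It shows that one more raw score point is worth far more logits near the limits of the test than in the middle.

```python
import numpy as np

n_items = 20                                  # hypothetical test length
raw = np.arange(1, n_items)                   # raw scores 1..19 (0 and 20 have no finite logit)
logit = np.log(raw / (n_items - raw))         # log-odds of each raw score
gaps = np.diff(logit)                         # logits gained per extra raw score point

for r, g in zip(raw[:-1], gaps):
    print(f"raw {r:2d} -> {r + 1:2d}: {g:.2f} logits")
```

The printed gaps are symmetric around the middle of the test and grow steadily toward both ends, which is the non-linearity that analyses of raw scores inherit.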
Most factor analysts swallow problems 1 through 6. But
problem 7 is fatal. As the numerical instability of presumed
"replications" emerges, we are forced to retreat from
non-reproducible numbers to nominal conclusions. Person factor
scores are abandoned. All but the relative magnitudes of item
factor loadings are ignored. The only use we make of the factor
analysis output is to classify items according to their highest
factor loading. Person scores for each category of items are
obtained, not from factor scores (even when separate refactorings
are done for each class of items) but by summing the original
standardized (but inevitably non-linear) person responses $z_{ni1}$ to
the items in a factor class. Whatever the distribution of item
loadings in a factor class may suggest, all items are given equal
weight in this summation of non-linear numerical labels.



Rasch Factor Analysis 

Why not admit that our data are neither measures nor
cardinal numbers but necessarily begin as nothing more than
labels $\{\{c_{ni}\}\}$ for nominal qualities - labels which may respond to
an intelligent ordinal scoring $x_{ni} = 0,1,2,3,\ldots$ to produce the
ordinal score matrix $\{\{x_{ni}\}\}$?

Familiar examples are: rating scales like (strongly agree >
agree > disagree > strongly disagree) which can usually be scored
$x_{ni} = 3,2,1,0$ or at least $x_{ni} = 2,1,0,0$ and MCQ options like
(right > wrong) which can always be scored $x_{ni} = 1,0$. Even raw
scores (r+1 > r > r-1) can respond to an ordinal interpretation.



Why not embrace the inescapable initially nominal but
possibly ordinally interpretable status of datum $x_{ni}$ and address
it directly for what it is with a probability model for the
occurrence of ordered categories (Rasch 1961, Andrich 1978, Wright
and Masters 1982)?

First, we will show the algebra of this approach in its
simplest form, the use of $x_{ni} = 0,1$ to represent a dichotomy through
which nominal events interpreted as signifying "more" of an
intended variable are scored "1" and nominal events signifying
less are scored "0" (Rasch 1960/1980/1992, pp 62-124).

Then we will illustrate the empirical similarities and 
differences between factor analysis and Rasch measurement by 
analyzing the responses of 2049 public school teachers to a 
"Strength of Principal Leadership" rating scale. 
To begin the algebra for $x_{ni} = 0,1$ we will not mistake the
number labeling as a linear measure, but will, instead, recognize
it for the binomial it is - a stochastic binomial for which we
have decided, on the basis of our measuring intentions, what kind
of events are "better" for our purposes than others, i.e. what we
decide qualifies as a "right" answer. To set up a counting
system for this preference, we label the preferred ("right"
answer) event "1" and its absence ("not right" hence "wrong"
answer) "0".

The error model which follows is not the ill-suited linear
error (true score) model of factor analysis (Equation 2) but a
binomial probability model $P_{nix}$ for the occurrence of $x_{ni} = 0,1$
(Equation 8) which, because of its formulation (Rasch 1960):

1) Obtains the parameter separability necessary for
constructing objective (additive conjoint) measurement (Luce
and Tukey 1964, Perline, Wright and Wainer 1979) and

2) Has Fisher sufficient statistics (1922) for linear person
measures $B_n$ and item calibrations $D_i$ which combine additively
as $(B_n - D_i)$ (and therefore construct the linearity we need for
subsequent quantitative analysis) to govern the
probabilities of $x_{ni} = 0,1$.

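The necessary and sufficient model is:

$$\log(P_{ni1} / P_{ni0}) = B_n - D_i, \qquad x_{ni} = 0, 1 \qquad (8)$$

from which

$$P_{nix} = \exp[x (B_n - D_i)] / [1 + \exp(B_n - D_i)]$$

Parameters for this model, as with factor analysis (5), are
estimated by minimizing:

$$\sum_{n=1}^{N} \sum_{i=1}^{L} (x_{ni} - E x_{ni})^2$$

but now

$$E x_{ni} = P_{ni1} = \exp(B_n - D_i) / [1 + \exp(B_n - D_i)] \qquad (9)$$

rather than $E z_{ni} = u_n v_i$.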
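
One common way to estimate the parameters of Equations (8) and (9) is joint maximum likelihood with alternating Newton steps, sketched below. This is offered as an illustration under assumptions, not as the algorithm of any particular program: the function name `rasch_jmle`, the step clipping, the convention of centring item calibrations at zero, and the simulated data are all choices made for this example.

```python
import numpy as np

def rasch_jmle(x, max_iter=500, tol=1e-6):
    """Joint maximum likelihood sketch for the dichotomous model of Equation (8).

    x: persons-by-items array of 0/1 responses with no extreme (all-0 or
    all-1) rows or columns. Returns person measures, item calibrations
    (centred at zero) and their standard errors, all in logits.
    """
    x = np.asarray(x, dtype=float)
    n_persons, n_items = x.shape
    r = x.sum(axis=1)                      # person raw scores
    s = x.sum(axis=0)                      # item raw scores
    if ((r == 0) | (r == n_items)).any() or ((s == 0) | (s == n_persons)).any():
        raise ValueError("extreme scores have no finite estimate; remove them first")

    def prob(b, d):                        # Equation (9)
        return 1.0 / (1.0 + np.exp(-(b[:, None] - d[None, :])))

    b = np.log(r / (n_items - r))          # starting values from raw scores
    d = np.log((n_persons - s) / s)
    d -= d.mean()
    for _ in range(max_iter):
        p = prob(b, d)
        b_step = np.clip((r - p.sum(axis=1)) / (p * (1 - p)).sum(axis=1), -1, 1)
        b = b + b_step
        p = prob(b, d)
        d_step = np.clip((p.sum(axis=0) - s) / (p * (1 - p)).sum(axis=0), -1, 1)
        d = d + d_step
        d -= d.mean()                      # origin fixed at the mean item calibration
        if max(np.abs(b_step).max(), np.abs(d_step).max()) < tol:
            break
    w = prob(b, d)
    w = w * (1 - w)                        # binomial variances P_ni1 * P_ni0
    return b, d, 1 / np.sqrt(w.sum(axis=1)), 1 / np.sqrt(w.sum(axis=0))

# Hypothetical simulation: 500 persons, 10 items with known calibrations.
rng = np.random.default_rng(0)
true_b = rng.normal(0.0, 1.0, size=500)
true_d = np.linspace(-1.5, 1.5, 10)
p_true = 1.0 / (1.0 + np.exp(-(true_b[:, None] - true_d[None, :])))
x = (rng.random(p_true.shape) < p_true).astype(int)
keep = (x.sum(axis=1) > 0) & (x.sum(axis=1) < x.shape[1])   # drop extreme persons
b_hat, d_hat, se_b, se_d = rasch_jmle(x[keep])
```

At convergence each person's expected score equals the observed raw score, and likewise for each item, which reflects the sufficiency of raw scores noted above. Missing responses could be accommodated by summing only over observed cells, but that is not shown here.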
With this Rasch approach to $x_{ni}$ we get not only least-square
(the implementation here for maximizing likelihood) estimates of
linear person measures $B_n$ and linear item calibrations $D_i$ along
the single common measurement line (variable) which they
conjointly define, but also a stochastic basis, the binomial
error variances $P_{ni1} P_{ni0}$, for estimating relevant standard errors
for $B_n$ and $D_i$ and for evaluating the probabilities of residuals:

$$y_{ni} = x_{ni} - P_{ni1}, \qquad z_{ni} = y_{ni} / (P_{ni1} P_{ni0})^{1/2}, \qquad E z_{ni} = 0, \qquad V z_{ni} = 1 \qquad (10)$$

This enables a detailed misfit analysis which, in turn,
allows a partition of the full matrix of residuals $\{\{z_{ni}\}\}$ into
those many $z_{ni}$ which are observed to be no greater than the
probability model expects, say $|z_{ni}| < 2$, and hence, in all
probability, of no immediate empirical interest and the,
possibly interesting, subset of remaining, more extreme
residuals, say $|z_{ni}| > 3$, which are sufficiently improbable to
invite further investigation and reconsideration as possible
evidence of a second variable.
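
The partition of residuals described above can be sketched as follows. The function name `standardized_residuals` and the tiny data set with its person measures and item calibrations are hypothetical, chosen so that exactly one response is improbable. The item mean-square shown is the plain average of squared standardized residuals, used here as an assumed simple summary rather than the exact statistic of any program.

```python
import numpy as np

def standardized_residuals(x, b, d):
    """Standardized residuals z_ni of Equation (10) for 0/1 data."""
    p = 1.0 / (1.0 + np.exp(-(b[:, None] - d[None, :])))   # P_ni1
    return (x - p) / np.sqrt(p * (1 - p))

# Hypothetical responses, person measures and item calibrations (logits).
x = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 0, 1]])
b = np.array([1.0, 0.0, -1.0])
d = np.array([-1.0, 0.0, 3.0])

z = standardized_residuals(x, b, d)
surprising = np.argwhere(np.abs(z) > 3)      # (person, item) pairs worth a second look
item_mean_square = (z ** 2).mean(axis=0)     # unweighted mean-square per item
```

Here the lowest person succeeds on an item 4 logits above them, which yields a residual of about 7.4 and is the only cell flagged.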

It is only when improbable residuals emerge that there is an 
empirical incentive to look for a presumably unexpected second 
variable. 4 

Should we decide to venture further, the improbable
residuals tell us exactly where to look. No need to accumulate
confusion by wallowing through the full matrix $\{\{z_{ni}\}\}$ or by
subsequent factor rotations which are, after all, directed by the
largest residuals. We already know from the residuals in hand
which particular items (and also which persons) do not fit into
the construction of the first variable. Should there be another
useful, albeit unexpected, variable in these data, it will be
most directly accessible among the original (rather than
residual) person responses to the subset of items which misfit
the first analysis.

To seek a second variable j=2, therefore, we concentrate on
just the data for the subset of misfitting items. 5 We apply the
Rasch probability model again, not to the whole matrix of
residuals $\{\{z_{ni}\}\}$, but only to the submatrix $\{\{x_{ni}\}\}$ of
original ordinally interpreted responses of persons to this
subset of items. We estimate a new set of linear item
calibrations $\{D_{i2}\}$ for just these misfitting items and a
new set of additive conjoint person measures $\{B_{n2}\}$ on this newly
defined "Variable 2".



4 In sensible research, of course, a "theoretical" incentive would
[illegible] results. There would be a set of well-designed items
which were intended to define a single variable. The "empirical" question would
narrow to finding out whether any of these well-intended items failed to perform
usefully and, if so, with which particular persons and possibly "Why?".

5 The focusing provided by identifying items manifesting improbable
residuals is analogous in purpose and consequence to the item clustering by which
factor rotation is guided.
To find out whether unexpected Variable 2 brings out any new
information, we plot Variable 2 person measures $\{B_{n2}\}$ against
Variable 1 person measures $\{B_{n1}\}$. The shape of this plot shows us
in detail the extent to which we have found a useful second
variable and also for which of these persons it may actually
provide new information.

Because we have independent standard errors for each linear
measure on each variable, the statistical status of the
differences $(B_{n2} - B_{n1})$ for each of the n=1,N persons can be judged
objectively by comparing them with their estimated standard
errors:

$$(B_{n2} - B_{n1}) / (SE_{n2}^2 + SE_{n1}^2)^{1/2} \sim N(0,1) \qquad (11)$$
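
The comparison in Equation (11) is a one-line computation once measures and standard errors are in hand. In this sketch the function name `compare_measures`, the three persons' values and the cut-off of 2 are hypothetical choices for illustration.

```python
import numpy as np

def compare_measures(b1, se1, b2, se2):
    """Standardized difference between two measures of the same person, Equation (11)."""
    return (b2 - b1) / np.sqrt(se1 ** 2 + se2 ** 2)

# Hypothetical measures (logits) of three persons on Variables 1 and 2.
b1 = np.array([0.5, 1.2, -0.3])
se1 = np.array([0.4, 0.4, 0.5])
b2 = np.array([0.7, -0.8, -0.1])
se2 = np.array([0.6, 0.6, 0.6])

t = compare_measures(b1, se1, b2, se2)
differs = np.abs(t) > 2          # persons for whom Variable 2 may say something new
```

Only the second person, whose two measures differ by 2.0 logits against a joint standard error of about 0.72, would be flagged.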

Extension of the dichotomous Rasch model to the ordered
response categories $x_{ni} = 0,1,2,3,\ldots,m,\ldots,\infty$ of rating scales, partial
credits, grades, ranks, raw scores, counts and to models with
additional facets for raters and tasks is straightforward
(Wright & Masters 1982, Linacre 1989).



An Empirical Example 

We will illustrate the empirical similarities and 
differences between factor analysis and Rasch measurement by 
using both methods to analyze the responses which 2049 Chicago 
public school teachers gave to the 13 item "Strength of Principal 
Leadership" rating scale on page 8 of the Consortium on Chicago 
School Research questionnaire "Charting Reform: The Teachers' 
Turn, 1994". 6 

The 13 rating items in Figure 1 were written to define a 
single line of inquiry which produced a single measure of 
perceived "Strength of Principal Leadership" for each of the 2049 
teachers. The methodological question then is: Are these 13 
items used by these 2049 teachers in a way that enables the 
construction of a reasonable and useful single measure? 

[Figure 1] 

The three items about to be exposed as diverging from the 
coherent core defined by the other 10 are marked [A], [B] and [C] 
in Figure 1. 

Figure 2 is the factor analysis scree plot of principal 
component eigenvalues for these data. Some factor analysts might 
conclude, at this point, that the 13 items work together well 
enough to define a single 13 item factor and stop. The scree 
plot, however, does hint that components [2] and [3] may be a bit 
too large. 



6 The research from which these data come is supported by the Consortium on
Chicago School Research under the direction of Anthony S. Bryk and Penny Sebring. 
The factor analyses were done with SAS by Stuart Luppescu. The Rasch measurement 
analyses were done with BIGSTEPS by Winifred Lopez. 
[Figure 2]

Figure 3 is a Rasch measurement item misfit plot for the
same data. Here the salient misfit of items [A], [B] and [C] is
unequivocally distinct.

[Figure 3] 

Figure 4 is the table of varimax factor loadings. Factor
analysts who rotate these data will not miss the exceptionality
of items [A], [B] and [C] and, after studying their item text,
will have some useful ideas as to why these three items might not
follow the mainline defined by the 10 item core.

[Figure 4] 

Figure 5 is the Rasch measurement item calibration table
listed both in misfit and in measurement orders. Here we see for
each of the 13 items not only its fit statistics in mean square
and standardized form but also its raw score point-biserial and
relative "difficulty to be agreed with", its "unpopularity" as it
were.

[Figure 5] 

Figure 5 shows that items [B] and [C] are 0.4 logits harder
to agree with (1.04 and 0.99 versus 0.61 logits) than the next
"hardest" item. Their texts share a close, personal supervision
of teacher by principal. This supervision could be supportive
but it is more likely to be restrictive. Indeed the verb
"supervises" is viewed by many as counter-collaborative. Thus
these items are not only hard to agree with but also ambiguous
with respect to the spirit of increasing egalitarian
collaboration which dominates the hierarchy of the 10 core items.

Item [A], "Principal makes all final decisions." at the
other end of the line (at -0.77 logits) is the item easiest to
agree with but also emphatically counter-collaborative.

Once the text of these three items is examined and
understood in terms of the pro-collaborative tenor of the 10 core
items, it is easy to agree with both factor analysis and Rasch
measurement results and to remove these three items from the
definition of this "Strength of Principal Leadership" variable.

Indeed, at this point we may find ourselves wondering why we
included these three items in the first place. We may also
wonder, now that we see the construct evolution of the variable
defined by the 10 core items, whether it would not be useful to
rename this variable "Strength of Principal Support for
Collaboration".

Do you see how useful the Rasch measurement information in
Figure 5 can be for confirming a decision to set aside the three
aberrant items and for identifying the construct evolved by the
hierarchy of the 10 core items? Other parts of the standard
Rasch measurement output are equally useful.
Figure 6 maps both 2049 teachers and 10 core items onto the
single line of inquiry that the 10 items define. The line of
inquiry rises from low agreement at the bottom of Figure 6 up to
high agreement at the top. On the left, each teacher is located
at their level of agreement. On the right, each item is located
first at its level of transition from "strongly disagree" to
"disagree", then at its level of transition from "disagree" to
"agree" and finally, on the far right, at its level of transition
from "agree" to "strongly agree".

[Figure 6] 

Figure 7 shows the same data in a different form. Now the 
line of inquiry is drawn to increase from left to right. The 
exact count of teachers at each level of agreement is given along 
the bottom of the upper figure. We can see that 27 teachers at 
the top of Figure 6 and at the far right of Figure 7 have 
"strongly agreed" with all 10 items. We can also see that the 
modal group of 333 teachers at a measure of about 1.3 logits is 
above the disagree-to-agree transition of even the hardest to 
agree with item. And we can see that 59 teachers at the far left 
of Figure 7 have claimed to "strongly disagree" with all 10 
items. 

[Figure 7] 

Figure 7 also shows us something of considerable importance
to our understanding and application of this rating scale as it
was used by these teachers. The spacing between rating scale
categories (1) to mark "strongly disagree", (2) to mark
"disagree", (3) to mark "agree" and (4) to mark "strongly agree"
is not uniform (equal) as a Likert interpretation would have it.
Values for the expected difficulties of each step are given in
the table at the bottom of Figure 7. The estimated increase in
difficulty from "strongly disagree" at step (1) to "disagree" at
step (2) in expected step measures is -1.74 - (-3.74) = 2.0
logits. But the estimated increase from "agree" at step (3) to
"strongly agree" at step (4) is 4.57 - 1.28 = 3.29 logits. The
second distance is 1.65 times as large as the first. We see
that it is tangibly easier to move up from "strongly disagree" to
"disagree", and so reduce strong disagreement, than it is to move
up from "agree" to "strongly agree", and so produce strong
agreement. Figure 7 also shows us why we might prefer to avoid
factoring Likert scores like 1,2,3,4 as though they were equally
spaced measures.
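
The spacing arithmetic above can be laid out explicitly. The four expected step measures are the values quoted in the text from Figure 7. The variable names are illustrative, and the middle gap is simply computed from those same four values.

```python
# Expected step measures (logits) for rating categories 1-4, as quoted in the text.
step_measure = {1: -3.74, 2: -1.74, 3: 1.28, 4: 4.57}

lower_gap = step_measure[2] - step_measure[1]    # strongly disagree -> disagree
middle_gap = step_measure[3] - step_measure[2]   # disagree -> agree
upper_gap = step_measure[4] - step_measure[3]    # agree -> strongly agree
ratio = upper_gap / lower_gap                    # about 1.65

print(f"gaps: {lower_gap:.2f}, {middle_gap:.2f}, {upper_gap:.2f} logits; ratio {ratio:.3f}")
```

Equal Likert scoring would treat all three gaps as the same size, whereas on the logit line they differ by roughly half as much again.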

Rasch measurement also shows us useful information about
each of the 2049 teachers. Figure 8 begins this part of the
Rasch analysis by showing the distribution of teacher response
pattern misfit. We see that a substantial number of teachers are
using these 10 items idiosyncratically. Subsequent pages of
output identify these teachers and show on which items they
provided surprising ratings. These diagnostic outputs show us
how teachers use the 10 items and differentiate the many teachers
whose responses are sufficiently coherent to produce a valid
measure of perceived "Strength of Principal Support for
Collaboration" from other teachers whose response patterns make
unique individual statements.

Copyright 10 26 94 B.D.Wright, MESA, 5835 Kimbark, Chicago 60637



[Figure 8] 



Figure 9 shows the exact and non-linear relationship between 
factor scores and Rasch measures for these 2049 teachers. Since 
the Rasch measures are modelled to be linear and the Rasch fit 
analyses expose any failures of data to support the linear 
measure construction based on this modelling, it must be the 
factor scores which are not linear. 

[Figure 9] 

Figure 10 summarizes Richard Smith's use of two-factor
simulated data to evaluate how well factor analysis and Rasch
measurement detect a second factor. Smith finds that factors of
equal size can only be discerned when they are uncorrelated (R < .3).
Against that kind of data factor analysis does better than Rasch
measurement.

[Figure 10] 

Against all other kinds of data, however, particularly the
kind of data most frequently encountered in social science 
research in which the factors are NOT of equal size and NOT 
uncorrelated, i.e. the usual situation where there is first an 
intended dominant factor and then an unintended off-shoot, 
correlated with the first factor and less well represented, Rasch 
measurement does better. 

Finally, Figure 11 collects into one summary table the 
considerations which bring out the similarities and differences 
between factor analysis and Rasch measurement. 

[Figure 11] 